Optimizing Multiple Spaced Seeds for Homology Search

Authors

  • Jinbo Xu
  • Daniel G. Brown
  • Ming Li
  • Bin Ma
Abstract

Optimized spaced seeds improve sensitivity and specificity in local homology search. Several authors have shown that multiple seeds can have better sensitivity and specificity than single seeds. We describe a linear programming (LP)-based algorithm to optimize a set of seeds. Theoretically, our algorithm offers a performance guarantee: the sensitivity of a chosen seed set is at least 70% of what can be achieved, in most reasonable models of homologous sequences. In practice, our algorithm generates a solution which is at least 90% of the optimal. Our method not only achieves performance better than or comparable to that of a greedy algorithm, but also gives this area a mathematical foundation.
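
To make the quantity being optimized concrete, the following is a minimal sketch of how the sensitivity of a seed set could be computed by brute force. It is not the LP-based algorithm described in the abstract. It assumes the simplest homology model, in which each column of a region matches independently with probability p, and the names `seed_hits` and `sensitivity` are hypothetical helpers introduced only for illustration.

```python
from itertools import product


def seed_hits(seed: str, region: str) -> bool:
    """Return True if the seed hits the region at some offset.

    seed:   '1' marks a position that must match, '0' a don't-care position.
    region: '1' marks a matching column of the alignment, '0' a mismatch.
    """
    span = len(seed)
    for start in range(len(region) - span + 1):
        if all(region[start + i] == "1" for i, c in enumerate(seed) if c == "1"):
            return True
    return False


def sensitivity(seeds, length: int, p: float) -> float:
    """Exact probability that at least one seed hits a random region.

    Assumption: columns match independently with probability p.
    Enumerates all 2**length regions, so only practical for short regions.
    """
    total = 0.0
    for bits in product("01", repeat=length):
        region = "".join(bits)
        if any(seed_hits(s, region) for s in seeds):
            matches = region.count("1")
            total += p ** matches * (1 - p) ** (length - matches)
    return total


if __name__ == "__main__":
    # Compare a single seed with a two-seed set on a short region.
    single = sensitivity(["1101"], 12, 0.7)
    double = sensitivity(["1101", "1011"], 12, 0.7)
    print(f"one seed: {single:.4f}, two seeds: {double:.4f}")
```

Under this model, adding a seed to a set can never lower its sensitivity, which is why the interesting question is which fixed-size set to choose. The enumeration grows as 2^length, so realistic region lengths call for dynamic programming or sampling instead.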

Similar resources

Multiple spaced seeds for homology search

MOTIVATION: Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. The introduction of optimal spaced seeds in PatternHunter has increased both the sensitivity and the speed of homology search, and it has been adopted by many alignment programs such as BLAST. With the further improvement provided by multiple spaced seeds in PatternHunterII, Smi...

Fast Computation of Good Multiple Spaced Seeds

Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. A significant fraction of computing power in the world is dedicated to performing such tasks. The introduction of optimal spaced seeds by Ma et al. has increased both the sensitivity and the speed of homology search and it has been adopted by many alignment programs such as BLAST. With the...

Sensitivity analysis and efficient method for identifying optimal spaced seeds

The novel introduction of spaced seed idea in the filtration stage of sequence comparison by Ma et al. (Bioinformatics 18 (2002) 440) has greatly increased the sensitivity of homology search without compromising the speed of search. Finding the optimal spaced seeds is of great importance both theoretically and in designing better search tool for sequence comparison. In this paper, we study the ...

On the complexity of the spaced seeds

Optimal spaced seeds were introduced by the theoretical computer science community to bioinformatics to effectively increase homology search sensitivity. These seeds are serving many homology queries daily. However, the computational complexity of finding the optimal spaced seeds remains open. In this paper, we prove that computing hit probability of a spaced seed in a uniform homology reg...

Seed-Set Construction by Equi-entropy Partitioning for Efficient and Sensitive Short-Read Mapping

Spaced seeds have been shown to be superior to continuous seeds for efficient and sensitive homology search based on the seed-and-extend paradigm. Much the same is true in genome mapping of high-throughput short-read data. However, a highly sensitive search with multiple spaced patterns often requires the use of a great amount of index data. We propose a novel seed-set construction method for ef...

Journal title:
  • Journal of computational biology : a journal of computational molecular cell biology

Volume 13, Issue 7

Pages: -

Publication date: 2004